31 research outputs found

    Data Analysis in Multimedia Quality Assessment: Revisiting the Statistical Tests

    Assessment of multimedia quality relies heavily on subjective evaluation, typically carried out by human subjects in the form of preferences or continuous ratings. Such data are crucial for the analysis of different multimedia processing algorithms as well as for the validation of objective (computational) methods developed for the same purpose. To that end, statistical testing provides a theoretical framework for drawing meaningful inferences and making well-grounded conclusions and recommendations. While parametric tests (such as the t-test and ANOVA, and error estimates like confidence intervals) are popular and widely used in the community, there appears to be a certain degree of confusion in their application. Specifically, the assumptions of normality and homogeneity of variance are often not well understood. The main goal of this paper is therefore to revisit these assumptions from a theoretical perspective and, in the process, provide useful insights into their practical implications. Experimental results on both simulated and real data are presented to support the arguments made. Software implementing the recommendations is also made publicly available, in order to achieve the goal of reproducible research.
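    As an illustration of the kind of assumption checks discussed above (this is a minimal sketch, not the paper's released software), the snippet below uses SciPy to test two groups of subjective ratings for normality and homogeneity of variance before choosing between a t-test and a nonparametric alternative; the ratings are synthetic placeholders.

```python
# Illustrative sketch: check the normality and equal-variance assumptions on two
# sets of subjective ratings before picking a statistical test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ratings_a = rng.normal(loc=3.2, scale=0.8, size=24)   # synthetic MOS-like ratings, method A
ratings_b = rng.normal(loc=3.6, scale=1.4, size=24)   # synthetic MOS-like ratings, method B

# Normality of each group (Shapiro-Wilk).
_, p_norm_a = stats.shapiro(ratings_a)
_, p_norm_b = stats.shapiro(ratings_b)

# Homogeneity of variance across groups (Levene's test).
_, p_var = stats.levene(ratings_a, ratings_b)

alpha = 0.05
if p_norm_a > alpha and p_norm_b > alpha:
    # Normality plausible: use Welch's t-test if variances differ, Student's otherwise.
    equal_var = p_var > alpha
    stat, p_value = stats.ttest_ind(ratings_a, ratings_b, equal_var=equal_var)
    test_used = "Student t-test" if equal_var else "Welch t-test"
else:
    # Normality doubtful: fall back to the Mann-Whitney U test.
    stat, p_value = stats.mannwhitneyu(ratings_a, ratings_b, alternative="two-sided")
    test_used = "Mann-Whitney U"

print(f"{test_used}: statistic={stat:.3f}, p={p_value:.3f}")
```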

    Saliency detection for stereoscopic images

    Saliency detection techniques have been widely used in various 2D multimedia processing applications. The emerging applications of stereoscopic displays now require new saliency detection models for stereoscopic images. Unlike saliency detection for 2D images, depth features have to be taken into account for stereoscopic images. In this paper, we propose a new stereoscopic saliency detection framework based on the feature contrast of color, intensity, texture, and depth. Four types of features, namely color, luminance, texture, and depth, are extracted from DCT coefficients to represent the energy of image patches. A Gaussian model of the spatial distance between image patches is adopted to account for both local and global contrast. A new fusion method is designed to combine the feature maps into the final saliency map for stereoscopic images. Experimental results on a recent eye-tracking database show the superior performance of the proposed method over existing ones in saliency estimation for 3D images.
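    A minimal sketch of the patch-contrast idea described above, assuming a grayscale image plus a depth map and using only mean luminance and mean depth as patch features; the patch size and Gaussian spatial weighting are illustrative choices, not the paper's exact parameters.

```python
# Minimal sketch of Gaussian-weighted patch feature contrast (illustrative parameters):
# a patch's saliency grows with its feature difference from other patches, with
# spatially closer patches weighted more heavily.
import numpy as np

def patch_contrast_saliency(luma, depth, patch=16, sigma=0.25):
    h, w = luma.shape
    gy, gx = h // patch, w // patch
    feats, centers = [], []
    for i in range(gy):
        for j in range(gx):
            block_l = luma[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            block_d = depth[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            feats.append([block_l.mean(), block_d.mean()])      # per-patch feature vector
            centers.append([(i + 0.5) / gy, (j + 0.5) / gx])    # normalized patch center
    feats = np.asarray(feats)
    centers = np.asarray(centers)

    # Pairwise feature contrast weighted by a Gaussian of normalized spatial distance.
    diff = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    weight = np.exp(-(dist ** 2) / (2 * sigma ** 2))
    sal = (weight * diff).sum(axis=1)

    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
    return sal.reshape(gy, gx)

# Toy usage with random data standing in for a luminance image and its depth map.
rng = np.random.default_rng(1)
saliency = patch_contrast_saliency(rng.random((128, 128)), rng.random((128, 128)))
print(saliency.shape)  # (8, 8) patch-level saliency map
```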

    ROBUSTNESS AND PREDICTION ACCURACY OF MACHINE LEARNING FOR OBJECTIVE VISUAL QUALITY ASSESSMENT

    Machine Learning (ML) is a powerful tool to support the development of objective visual quality assessment metrics, serving as a substitute model for the perceptual mechanisms involved in visual quality appreciation. Nevertheless, the reliability of ML-based techniques within objective quality assessment metrics is often questioned. In this study, the robustness of ML in supporting objective quality assessment is investigated, specifically when the feature set adopted for prediction is suboptimal. A Principal Component Regression based algorithm and a Feed-Forward Neural Network are compared when pooling Structural Similarity Index (SSIM) features perturbed with noise. The neural network adapts better to noise and intrinsically favours features according to their salient content.
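    A hedged sketch of the comparison described above: Principal Component Regression versus a small feed-forward network pooling noisy features into a quality score. The features and subjective scores here are synthetic stand-ins, and the model sizes are arbitrary, not the study's configuration.

```python
# Sketch: compare PCR and a small feed-forward network on noise-perturbed features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n, d = 400, 12
X = rng.random((n, d))                                  # stand-in per-image feature vectors
mos = X @ rng.random(d) + 0.1 * rng.normal(size=n)      # stand-in subjective scores
X_noisy = X + 0.3 * rng.normal(size=X.shape)            # suboptimal (perturbed) features

X_tr, X_te, y_tr, y_te = train_test_split(X_noisy, mos, test_size=0.3, random_state=0)

pcr = make_pipeline(PCA(n_components=6), LinearRegression()).fit(X_tr, y_tr)
ffnn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0).fit(X_tr, y_tr)

for name, model in [("PCR", pcr), ("FFNN", ffnn)]:
    rho, _ = spearmanr(model.predict(X_te), y_te)
    print(f"{name}: SROCC = {rho:.3f}")
```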

    Learning based signal quality assessment for multimedia communications

    Multimedia content (including image/video, speech, audio, graphics, and so on) can be affected by a wide variety of distortions during acquisition, compression, processing, transmission, and reproduction, which generally leads to a loss of perceptual quality. As a result, signal quality assessment is an important component in today’s multimedia communication systems. In this thesis, perceptual quality assessment algorithms are proposed for three important types of multimedia signals, namely image, video, and speech. This involves two crucial stages: (a) feature extraction/detection, and (b) feature pooling. The first stage calls for investigation and analysis of appropriate and effective signal features that extract meaningful information and provide a compact representation of the signal with regard to quality. This is crucial because the selected features form the basis of the resultant quality metric. In this thesis, we discuss and provide detailed analysis of features based on Singular Value Decomposition, the 2D mel-cepstrum, and the phase of the Fourier Transform for visual quality assessment. We analyse the advantages and disadvantages of these features with regard to prediction accuracy and complexity. We also investigate mel filter bank energies as features for evaluating the quality of noise-suppressed speech and justify their effectiveness via theoretical and experimental analysis. The second stage requires the determination of appropriate weights for fusing the features into a single score that accurately reflects human judgement of perceptual quality. We tackle this by using machine learning techniques, which have been successfully employed in numerous research areas (for example, in computer vision tasks such as object localization/tracking/recognition) but have not been adequately explored in the literature on objective quality evaluation. Their major advantage is a more systematic pooling methodology that avoids the unrealistic assumptions imposed by existing pooling methods. In this thesis, we demonstrate that machine learning can be effective for quality assessment if proper signal features are detected. We also provide insights into machine learning based feature pooling by analyzing the system trained on subjective scores, which quantify human perception. The proposed algorithms have been validated on a large number of subjectively rated, publicly available databases. We have performed careful experimental analysis (including within-database and cross-database tests) and demonstrated that the proposed schemes overall perform better than several relevant methods. The better alignment with human perception confirms the effectiveness of the algorithms proposed in this thesis.
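    An illustrative sketch of the cross-database protocol mentioned above: train a learned feature-pooling model on one subjectively rated database and evaluate it on another, reporting rank and linear correlation with the subjective scores. The databases, features, and regressor below are synthetic placeholders, not the thesis's actual data or models.

```python
# Sketch of cross-database evaluation of a learned feature-pooling model.
import numpy as np
from sklearn.svm import SVR
from scipy.stats import spearmanr, pearsonr

rng = np.random.default_rng(0)

def fake_database(n, d=10, noise=0.2):
    X = rng.random((n, d))                                               # per-signal features
    y = np.tanh(X @ np.linspace(0.2, 1.0, d)) + noise * rng.normal(size=n)  # pseudo-MOS
    return X, y

train_db = fake_database(300)    # stands in for the training database
test_db = fake_database(150)     # stands in for a completely unseen database

model = SVR(kernel="rbf", C=1.0).fit(*train_db)
pred = model.predict(test_db[0])

srocc, _ = spearmanr(pred, test_db[1])
plcc, _ = pearsonr(pred, test_db[1])
print(f"Cross-database SROCC = {srocc:.3f}, PLCC = {plcc:.3f}")
```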

    SVD-based quality metric for image and video using machine learning

    We study the use of machine learning for visual quality evaluation with comprehensive singular value decomposition (SVD)-based visual features. In this paper, the two-stage process and the relevant work in existing visual quality metrics are first introduced, followed by an in-depth analysis of SVD for visual quality assessment. Singular values and vectors form the selected features for visual quality assessment. Machine learning is then used for the feature pooling process and demonstrated to be effective. This addresses the limitations of existing pooling techniques, such as simple summation, averaging, and Minkowski summation, which tend to be ad hoc. We advocate machine learning for feature pooling because it is more systematic and data driven. The experiments show that the proposed method outperforms eight existing relevant schemes. Extensive analysis and cross-validation are performed with ten publicly available databases (eight for images with a total of 4042 test images and two for video with a total of 228 videos). We use only publicly accessible software and databases in this study, and we also make our own software public, to facilitate comparison in future research.
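    A rough sketch of SVD-based features for full-reference assessment in the spirit described above: compare the singular values of co-located blocks in the reference and distorted images and summarize the block-level distances for a learned pooling stage. The block size and the summary statistics are illustrative choices, not the paper's exact feature set.

```python
# Sketch: block-wise singular value comparison as features for learned pooling.
import numpy as np

def svd_block_features(ref, dist, block=32):
    feats = []
    h, w = ref.shape
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            s_ref = np.linalg.svd(ref[i:i+block, j:j+block], compute_uv=False)
            s_dis = np.linalg.svd(dist[i:i+block, j:j+block], compute_uv=False)
            feats.append(np.linalg.norm(s_ref - s_dis))   # per-block singular value distance
    feats = np.asarray(feats)
    # Compact summary of the block-level distortion distribution.
    return np.array([feats.mean(), feats.std(), feats.max(), np.median(feats)])

# Toy usage: a reference image and a noisy version of it.
rng = np.random.default_rng(0)
ref = rng.random((128, 128))
dist = ref + 0.05 * rng.normal(size=ref.shape)
print(svd_block_features(ref, dist))   # feature vector to be pooled by a trained model
```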

    Scalable image quality assessment with 2D mel-cepstrum and machine learning approach

    Measurement of image quality is of fundamental importance to numerous image and video processing applications. Objective image quality assessment (IQA) is a two-stage process comprising: (a) extraction of important information and discarding of the redundant, and (b) pooling of the detected features using appropriate weights. Neither stage is easy to tackle because of the complex nature of the human visual system (HVS). In this paper, we first investigate image features based on the two-dimensional (2D) mel-cepstrum for the purpose of IQA. We show that these features are effective because they can represent structural information, which is crucial for IQA. Moreover, they are also beneficial in a reduced-reference scenario, where only partial reference image information is used for quality assessment. We address the second issue by exploiting machine learning. In our opinion, the well-established methodology of machine learning/pattern recognition has not been adequately used for IQA so far; we believe it to be an effective tool for feature pooling, since the required weights/parameters can be determined in a more convincing way via training with ground truth obtained from subjective scores. This helps to overcome the limitations of existing pooling methods, which tend to be overly simplistic and lack theoretical justification. We therefore propose a new metric by formulating IQA as a pattern recognition problem. Extensive experiments conducted on six publicly available image databases (a total of 3211 images with diverse distortions) and one video database (with 78 video sequences) demonstrate the effectiveness and efficiency of the proposed metric in comparison with seven relevant existing metrics.
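    The sketch below is a simplified stand-in for 2D mel-cepstrum style features, not the paper's exact recipe: take the 2D FFT magnitude of a block, pool it into mel-like (logarithmically spaced) radial frequency bands, take the log, and decorrelate with a DCT to obtain compact cepstral coefficients. Band count and coefficient count are arbitrary.

```python
# Simplified cepstral feature sketch for an image block (illustrative only).
import numpy as np
from scipy.fft import fft2, fftshift, dct

def radial_cepstral_features(block, n_bands=16, n_coeffs=8):
    mag = np.abs(fftshift(fft2(block)))
    h, w = mag.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)          # radial frequency of each bin
    # Mel-like (logarithmically spaced) band edges: denser at low frequencies.
    edges = np.expm1(np.linspace(0, np.log1p(r.max()), n_bands + 1))
    band_energy = np.array([
        mag[(r >= lo) & (r < hi)].sum() + 1e-8
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    # Log energies followed by a DCT, in the spirit of cepstral analysis.
    return dct(np.log(band_energy), norm="ortho")[:n_coeffs]

# Toy usage on a random block standing in for an image patch.
rng = np.random.default_rng(0)
print(radial_cepstral_features(rng.random((64, 64))))
```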

    Lorentzian Based Adaptive Filters for Impulsive Noise Environments


    Low-complexity video quality assessment using temporal quality variations

    Objective video quality assessment (VQA) is the use of computational models to evaluate video quality in line with the perception of the human visual system (HVS). It is challenging due to the underlying complexity and the relatively limited understanding of the HVS and its intricate mechanisms. Three important issues arise in objective VQA in comparison with image quality assessment: 1) temporal factors need to be considered in addition to spatial ones, 2) the contribution of each factor (spatial and temporal) and their interaction to the overall video quality need to be determined, and 3) the computational complexity of the resultant method must remain manageable. In this paper, we tackle the first issue by utilizing a worst-case pooling strategy and the variations of spatial quality along the temporal axis, with proper analysis and justification. The second issue is addressed by the use of machine learning; we believe this to be more convincing since the relationship between the factors and the overall quality is derived via training with substantial ground truth (i.e., subjective scores). Experiments conducted using publicly available video databases show the effectiveness of the proposed full-reference (FR) algorithm in comparison to relevant existing VQA schemes. Focus has also been placed on demonstrating the robustness of the proposed method to new and untrained data. To that end, cross-database tests have been carried out to provide a proper perspective on the performance of the proposed scheme as compared to other VQA methods. The third issue, computational cost, plays a key role in determining the feasibility of a VQA scheme for practical deployment, given the large amount of data that needs to be processed and analyzed in real time. A limitation of many existing VQA algorithms is their high computational complexity. In contrast, the proposed scheme is more efficient due to its low complexity, without jeopardizing prediction accuracy.
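    A hedged sketch of temporal pooling along the lines described above: combine a worst-case summary of frame-level quality scores with a measure of how much quality fluctuates over time. The percentile, the fixed combination weights, and the toy frame scores are placeholders; in the paper the mapping to the overall score is learned.

```python
# Sketch: worst-case temporal pooling plus a temporal quality-variation feature.
import numpy as np

def pool_temporal_quality(frame_scores, worst_pct=10.0):
    scores = np.asarray(frame_scores, dtype=float)
    # Worst-case pooling: average of the lowest worst_pct percent of frame scores.
    cutoff = np.percentile(scores, worst_pct)
    worst_case = scores[scores <= cutoff].mean()
    # Temporal variation: dispersion of frame-to-frame quality changes.
    variation = np.abs(np.diff(scores)).std()
    return worst_case, variation

# Toy usage: per-frame quality scores with a short quality drop in the middle.
frame_q = np.concatenate([np.full(60, 0.92), np.full(10, 0.55), np.full(60, 0.90)])
worst_case, variation = pool_temporal_quality(frame_q)

# Fixed linear combination used here purely as a placeholder for the trained model.
video_quality = 0.8 * worst_case - 0.2 * variation
print(f"worst-case={worst_case:.3f}, variation={variation:.3f}, pooled={video_quality:.3f}")
```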

    Image quality assessment based on gradient similarity

    In this paper, we propose a new image quality assessment (IQA) scheme with emphasis on gradient similarity. Gradients convey important visual information and are crucial to scene understanding; using such information, structural and contrast changes can be effectively captured. We therefore use gradient similarity to measure the change in contrast and structure in images. Apart from structural/contrast changes, image quality is also affected by luminance changes, which must also be accounted for to achieve complete and more robust IQA. Hence, the proposed scheme considers both luminance and contrast-structural changes to effectively assess image quality. Furthermore, the proposed scheme is designed to follow the masking effect and visibility threshold more closely, i.e., the case when both the masked and masking signals are small is handled more effectively. Finally, the effects of the changes in luminance and contrast-structure are integrated via an adaptive method to obtain the overall image quality score. Extensive experiments conducted with six publicly available subject-rated databases (comprising diverse images and distortion types) have confirmed the effectiveness, robustness, and efficiency of the proposed scheme in comparison with relevant state-of-the-art schemes.
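    An illustrative sketch of a gradient-similarity comparison in the spirit of the scheme above: Sobel gradient magnitudes of the reference and distorted images are compared with an SSIM-style similarity term, alongside a luminance similarity term. The constants and the fixed combination weight are placeholders; the paper integrates the terms adaptively.

```python
# Sketch: gradient and luminance similarity maps combined into a single score.
import numpy as np
from scipy.ndimage import sobel

def gradient_similarity_score(ref, dist, c1=1e-4, c2=1e-4, alpha=0.85):
    # Gradient magnitude via Sobel filtering along both axes.
    g_ref = np.hypot(sobel(ref, axis=0), sobel(ref, axis=1))
    g_dis = np.hypot(sobel(dist, axis=0), sobel(dist, axis=1))

    # Pixel-wise gradient (contrast-structure) similarity map.
    grad_sim = (2 * g_ref * g_dis + c1) / (g_ref ** 2 + g_dis ** 2 + c1)
    # Pixel-wise luminance similarity map.
    lum_sim = (2 * ref * dist + c2) / (ref ** 2 + dist ** 2 + c2)

    # Fixed weighting used here only as a stand-in for the adaptive integration.
    return float(np.mean(alpha * grad_sim + (1 - alpha) * lum_sim))

# Toy usage: a reference image and a noisy version of it.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
dist = np.clip(ref + 0.1 * rng.normal(size=ref.shape), 0, 1)
print(f"quality score = {gradient_similarity_score(ref, dist):.3f}")
```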

    An automated approach for tone mapping operator parameter adjustment in security applications

    High Dynamic Range (HDR) imaging has been gaining popularity in recent years. Unlike traditional low dynamic range (LDR) content, HDR content tends to be visually more appealing and realistic, as it can represent the dynamic range of the visual stimuli present in the real world. As a result, more scene details can be faithfully reproduced and the visual quality tends to improve. HDR can also be directly exploited in new applications such as video surveillance and other security tasks. Since more scene details are available in HDR, it can help in identifying and tracking visual information that might otherwise be difficult to recover from typical LDR content due to factors such as a lack or excess of illumination, extreme contrast in the scene, etc. On the other hand, HDR may raise issues related to increased privacy intrusion. To display HDR content on a regular screen, tone-mapping operators (TMOs) are used. In this paper, we present a universal method for tuning TMO parameters so as to preserve as many details as possible, which is desirable in security applications. The method's performance is verified on several TMOs by comparing the outcomes of tone mapping with default and with optimized parameters. The results suggest that the proposed approach preserves more information, which could be advantageous for security surveillance but, on the other hand, makes us consider a possible increase in privacy intrusion.
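    A toy sketch of automated TMO parameter tuning, not the paper's method: a simple global operator L/(L+s) with a single parameter s is tuned by grid search to maximize a crude detail-preservation measure (mean gradient magnitude of the tone-mapped image). The operator, the detail measure, and the search grid are all illustrative assumptions.

```python
# Toy sketch: tune one tone-mapping parameter to maximize preserved detail.
import numpy as np
from scipy.ndimage import sobel

def tone_map(hdr, s):
    return hdr / (hdr + s)                      # simple global compression curve

def detail_measure(ldr):
    return np.hypot(sobel(ldr, axis=0), sobel(ldr, axis=1)).mean()

def tune_parameter(hdr, candidates):
    scores = [detail_measure(tone_map(hdr, s)) for s in candidates]
    return candidates[int(np.argmax(scores))]

# Toy HDR-like image: random luminance spanning several orders of magnitude.
rng = np.random.default_rng(0)
hdr = np.exp(rng.uniform(np.log(1e-2), np.log(1e2), size=(128, 128)))

candidates = np.geomspace(1e-2, 1e2, 25)        # search grid for the parameter s
best_s = tune_parameter(hdr, candidates)
print(f"default s=1.0 detail={detail_measure(tone_map(hdr, 1.0)):.4f}, "
      f"tuned s={best_s:.3g} detail={detail_measure(tone_map(hdr, best_s)):.4f}")
```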